General and Robust Communication-Efficient Algorithms for Distributed Clustering
Authors
Abstract
As datasets become larger and more distributed, algorithms for distributed clustering have become increasingly important. In this work, we present a general framework for designing distributed clustering algorithms that are robust to outliers. Using our framework, we give a distributed approximation algorithm for k-means, k-median, or generally any ℓ_p objective, with z outliers and/or balance constraints, using O(m(k + z)(d + log n)) bits of communication, where m is the number of machines, n is the size of the point set, and d is the dimension. This generalizes and improves over the previous work of Bateni et al. [12] and Malkomes et al. [31]. As a special case, we achieve the first distributed algorithm for k-median with outliers, answering an open question posed by Malkomes et al. [31]. For distributed k-means clustering, we provide the first dimension-dependent communication complexity lower bound for finding the optimal clustering. This improves over the dimension-agnostic lower bound of Chen et al. [18]. Furthermore, we give distributed clustering algorithms which return nearly optimal solutions, provided the data satisfies the approximation stability condition of Balcan et al. [8] or the spectral stability condition of Kumar and Kannan [27]. In certain clustering applications where each machine only needs to find a clustering consistent with the global optimum, we show that no communication is necessary if the data satisfies approximation stability.
This work was supported in part by NSF grants CCF-1422910, CCF-1535967, IIS-1618714, a Sloan Research Fellowship, a Microsoft Research Faculty Fellowship, a Google Research Award, and a National Defense Science & Engineering Graduate (NDSEG) fellowship.
arXiv:1703.00830v1 [cs.DS] 2 Mar 2017
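To get a feel for the communication bound stated in the abstract, the short sketch below evaluates the expression m(k + z)(d + log n) for concrete values. It is only an order-of-magnitude illustration: the hidden constant of the O(·) notation is dropped, the logarithm is assumed to be base 2, and the function name and sample values are hypothetical and do not come from the paper.

```python
import math


def communication_bits(m: int, k: int, z: int, d: int, n: int) -> float:
    """Evaluate m * (k + z) * (d + log2 n).

    Assumption: the constant hidden by the O(.) bound is taken to be 1 and
    the logarithm is base 2, so the result is a scale estimate only.
    """
    if m < 1 or k < 1 or d < 1 or n < 1 or z < 0:
        raise ValueError("m, k, d, n must be >= 1 and z must be >= 0")
    return m * (k + z) * (d + math.log2(n))


# Hypothetical setting: 10 machines, 5 centers, 3 outliers,
# 16 dimensions, 1024 points in total.
bits = communication_bits(m=10, k=5, z=3, d=16, n=1024)
print(bits)  # 10 * 8 * (16 + 10) = 2080.0
```

Under these assumptions the estimate grows linearly in the number of machines m and in k + z, while the total number of points n enters only through the log n term.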
Related articles
MLCA: A Multi-Level Clustering Algorithm for Routing in Wireless Sensor Networks
Energy constraints are the biggest challenge in wireless sensor networks because each sensor node is powered by a battery that, given the applications of these networks, cannot be recharged or replaced. One of the successful methods for saving energy in these networks is clustering. As a result, cluster-based routing algorithms have been successful routing algorithms for these networks....
Multi-layer Clustering Topology Design in Densely Deployed Wireless Sensor Network using Evolutionary Algorithms
Due to resource constraints and dynamic parameters, reducing energy consumption has become one of the most important issues in wireless sensor network topology design. All proposed hierarchy methods cluster a WSN into different cluster layers in a single step of an evolutionary algorithm with complicated parameters, which may reduce efficiency and performance. In fact, in WSNs topology, increasin...
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
A Robust Distributed Estimation Algorithm under Alpha-Stable Noise Condition
Robust adaptive estimation of unknown parameters has been an important issue in recent years for reliable operation in distributed networks. The conventional adaptive estimation algorithms that rely on the mean square error (MSE) criterion exhibit good performance in the presence of Gaussian noise, but their performance decreases drastically under impulsive noise. In this paper, we propose a rob...
Distributed k-Means and k-Median Clustering on General Topologies
This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...
Journal title:
- CoRR
Volume abs/1703.00830
Pages -
Publication date 2017